Abstract

We present a domain-independent algorithm that computes macros 
in a novel way. Our algorithm computes macros "on-the-fly" for a 
given set of states and does not require previously learned or inferred 
information, nor prior domain knowledge. The algorithm is used to 
define new domain-independent tractable classes of classical planning 
that are proved to include Blocksworld-arm and Towers of Hanoi. 



1 Introduction 

Macros have long been studied in AI planning [9, 18]. Many domain-dependent
applications of macros have been exhibited and studied [15, 17, 12]; also, a number of
domain-independent methods for learning, inferring, filtering, and applying macros
have been the topic of research continuing up to the present [2, 7, 20].

In this paper, we present a domain-independent algorithm that computes macros
in a novel way. Our algorithm computes macros "on-the-fly" for a given set of states
and does not require previously learned or inferred information, nor does it need any
prior domain knowledge. We exhibit the power of our algorithm by using it to define
new domain-independent tractable classes of classical planning that strictly extend
previously defined such classes [6], and can be proved to include Blocksworld-arm
and Towers of Hanoi. We believe that this is notable, as theoretically defined,
domain-independent tractable classes have generally struggled to incorporate
construction-type domains such as these two. We hence give theoretically grounded
evidence of the computational value of macros in planning.

Our algorithm. Consider the following reachability problem: given an instance of
planning and a set S of states, compute the ordered pairs of states (s, t) ∈ S × S such
that the second state t is reachable from the first state s. (By reachable, we mean that
there is a sequence of operators that transforms the first state into the second.) This
problem is clearly hard in general, as deciding if one state is reachable from another
captures the complexity of planning itself.

A natural, albeit incomplete, algorithm for solving this reachability problem is to
first compute the pairs (s, t) ∈ S × S such that the state t is reachable from the state s
by application of a single operator, and then to compute the transitive closure of these
pairs. This algorithm is well-known to run in polynomial time (in the number of states
and the size of the instance) but will only discover pairs for which the reachability is
evidenced by plans staying within the set of states S: the algorithm is efficient but
incomplete.
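As a minimal sketch of this baseline, the Python fragment below computes the single-operator pairs and then closes them transitively. The dictionary encoding of states, the (precondition, postcondition) encoding of operators, and the names successor and closure_pairs are illustrative assumptions rather than notation from the paper.

```python
from itertools import product

def successor(state, action):
    """Return the state after `action`, or None if its precondition fails."""
    pre, post = action
    if all(state.get(v) == x for v, x in pre.items()):
        return {**state, **post}
    return None

def closure_pairs(states, actions):
    """Index pairs (i, j) with states[j] reachable from states[i] inside `states`."""
    key = lambda s: frozenset(s.items())
    index = {key(s): i for i, s in enumerate(states)}
    n = len(states)
    reach = [[False] * n for _ in range(n)]
    # Step 1: pairs connected by a single operator.
    for i, s in enumerate(states):
        for a in actions:
            t = successor(s, a)
            j = index.get(key(t)) if t is not None else None
            if j is not None and j != i:
                reach[i][j] = True
    # Step 2: Warshall's transitive closure (m is the intermediate state).
    for m, i, j in product(range(n), repeat=3):
        if reach[i][m] and reach[m][j]:
            reach[i][j] = True
    return {(i, j) for i in range(n) for j in range(n) if reach[i][j]}

# A counter x that can only be incremented; the state x = 2 is left out of S.
S = [{"x": 0}, {"x": 1}, {"x": 3}]
A = [({"x": k}, {"x": k + 1}) for k in range(3)]
pairs = closure_pairs(S, A)
```

In this toy instance the state x = 3 is reachable from x = 0, but only through x = 2, which lies outside S, so the closure reports just the pair of indices (0, 1). This is the incompleteness described above.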

The algorithm that we introduce is a strict generalization of this transitive closure
algorithm for the described reachability problem. We now turn to a brief, high-level
description of our algorithm. Our algorithm begins by computing the pairs connected
by a single operator, as in the just-described algorithm, but each pair is labelled with its
connecting operator. The algorithm then continually applies two types of transformations
to the current set of pairs until a fixed point is reached. Throughout the execution
of the algorithm, every pair has an associated label which is either a single operator or
a macro derived by combining existing labels. The first type of transformation (which
is similar to the transitive closure) is to take pairs of states having the form (s_1, s_2),
(s_2, s_3) and to add the pair (s_1, s_3) whose new label is the macro obtained by
"concatenating" the labels of the pairs (s_1, s_2) and (s_2, s_3). If the pair (s_1, s_3) is already
contained in the current set, the algorithm replaces the label of (s_1, s_3) with the new
label if the new label is "more general" than the old one.¹ The second type of
transformation is to take a state s ∈ S and a label of an existing pair, and to see if the
label applied to s yields a state t ∈ S; if so, the pair (s, t) is introduced, and the same
replacement procedure as before is invoked if the pair (s, t) is already present.

Our algorithm, as with the transitive closure, operates in polynomial time (as proved
in the paper) and is incomplete. We want to emphasize that it can, in general, identify
pairs that are not identified by the transitive closure algorithm. Why is this? Certainly,
some state pairs (s, t) introduced by the first type of transformation have macro labels
that, if executed one operator at a time, would stay within the set S, and hence are pairs
that are discovered by the transitive closure algorithm. However, the second type of
transformation may apply such a macro to other states to discover pairs (s, t) ∈ S × S
that would not be discovered by the transitive closure: this occurs when a step-by-step
execution of the macro, starting from s, would leave the set S before arriving at t.
Indeed, these two transformations depend on and feed off of each other: the first
transformation introduces increasingly powerful macros, which in turn can be used by the
second to increase the set of pairs, which in turn permits the first to derive yet more
powerful macros, and so forth.

¹ For the precise definitions of "concatenation" and "more general", please refer to the technical sections
of the paper.

We now describe two concrete results to offer the reader a feel for the power of
our algorithm. Let s be any state of a Blocksworld-arm instance, and let S be the set
H(s, 4) of states within Hamming distance 4 of s.² Let us use the term subtower to
refer to a sequence of blocks stacked on top of one another such that the top is clear. We
prove that our algorithm, given the set S, will discover macros that move any subtower
of s onto the ground (preserving the subtower structure). As another result, let s be the
initial state of the Towers of Hanoi problem, for any number of discs; and, let S be the
set H(s, 7) of states within Hamming distance 7 of s. We prove that our algorithm,
given the set S, will discover macros that, starting from the state s, move any subtower
of discs from the initial peg to either of the other pegs. In particular, our algorithm
will report that the goal state is reachable from the initial state s. Note that, in the case
of Blocksworld-arm, the constant 4 is independent of the state s, and in particular is
independent of the height of subtowers; likewise, in Towers of Hanoi, the constant 7
is independent of the number of discs. Note also that, as can be proved, the transitive
closure algorithm does not detect either of these reachability conditions, even when
S = H(s, k) for an arbitrarily large constant k.³ We emphasize again that our new
algorithm is fully domain-independent.
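To make the state sets H(s, k) concrete, the following Python sketch enumerates every state within Hamming distance k of a given state. The function name hamming_ball, the dictionary encoding of states, and the explicit domains argument are assumptions made for illustration only.

```python
from itertools import combinations, product

def hamming_ball(s, k, domains):
    """All states that differ from `s` on at most k variables.

    `domains` maps each variable to the list of values it may take.
    """
    variables = sorted(s)
    ball = []
    for r in range(k + 1):
        # Choose exactly r variables to change ...
        for changed in combinations(variables, r):
            # ... and give each of them a value different from the one in s.
            alternatives = [[x for x in domains[v] if x != s[v]] for v in changed]
            for values in product(*alternatives):
                t = dict(s)
                t.update(zip(changed, values))
                ball.append(t)
    return ball

# Five propositional variables, all initially 0.
domains = {f"v{i}": [0, 1] for i in range(5)}
s = {v: 0 for v in domains}
ball = hamming_ball(s, 2, domains)
```

Each state is produced exactly once, because the set of changed variables identifies it. With n variables and a fixed k, the number of states grows polynomially in n, which is what keeps the state set passed to the algorithm small.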

Our algorithm not only returns pairs of states, but also returns, for each state pair
(s, t), a succinct representation of a plan from s to t, as in [16]. Note that our algorithm
may discover pairs (s, t) for which the shortest plan from s to t is of exponential length,
when measured in terms of the original operators, as in the Towers of Hanoi domain.

Towards a tractability theory of domain-independent planning. Many of the
benchmark domains, such as Blocksworld-arm, Gripper, and Logistics, can now be handled
effectively and simultaneously by domain-independent planners, as borne out by
empirical evidence [14]. This empirically observed domain-independent tractability of
many common benchmark domains naturally calls for a theoretical explanation. By a
theoretical explanation, we mean the formal definition of tractable classes of planning
instances, and formal proofs that domains of interest fall into the classes. Clearly, such
an explanation could bring to the fore structural properties shared by these benchmark
domains.

To the best of our knowledge, research proposing tractable classes has generally 
had other foci, such as understanding syntactic restrictions on the operator set [5, 1, 8], 
studying restrictions of the causal graph, as in [3, 4, 11, 16], or empirical evaluation of
simplification rules [10]. Aligned with the present aims is the work of Hoffmann [13] 
that gives proofs that certain benchmark domains are solvable by local search with 
respect to various heuristics. 

² The Hamming distance between two states is defined as the number of variables at which they differ.

³ In the case of Towers of Hanoi, this follows immediately from the known exponential lower bound on
the length of a plan transforming the initial state to the goal state. For a fixed k ≥ 1, when given the initial
state and H(s, k), the transitive closure algorithm "stays within the set" H(s, k), which is of polynomial
O(n^k) size, and will not discover pairs (v, v') which are not linked by polynomial length plans.

To demonstrate the efficacy of our algorithm, we use it to extend previously defined
tractable classes. In particular, previous work [6] presented a complexity measure
called persistent Hamming width (PH width), and demonstrated that any set of
instances having bounded PH width (that is, PH width k for some constant k) is
polynomial-time tractable. It was shown that both the Gripper and Logistics domains have bounded
PH width, giving a uniform explanation for their tractability. In the present paper,
we show that an extension of this measure yields a tractable class containing both
the Blocksworld-arm and Towers of Hanoi domains, and we therefore obtain a single
tractable class which captures all four of these domains. As mentioned, we believe
that this is significant as theoretical treatments have generally had limited coverage of
construction-type domains such as Blocksworld-arm and Towers of Hanoi.

We want to emphasize that our objective here is not to simply establish tractability
of the domains under discussion: in them, plan generation is already well-known to be
tractable on an individual, domain-dependent basis. Rather, our objective is to give a
uniform, domain-independent explanation for the tractability of these domains. Neither
is our goal to prove that these domains have low time complexity; again, our primary
goal is to present a simple, domain-independent algorithm for which we can establish
tractability of these domains with respect to the heavily studied and mathematically
robust concept of polynomial time.

Previous work on macros. Macros have long been studied in planning [9]. Early 
work includes [19], which developed filtering algorithms for discovered macros, and 
[18], which demonstrated the ability of macros to exponentially reduce the size of the 
search space. 

Macros have been thoroughly applied in domain-specific scenarios such as puzzles
and other games. To name some examples, there has been work on the sliding tile
puzzle [15], Sokoban [17], and Rubik's cube [12].

Some recent research on integrating macros into domain-independent planning
systems is as follows. Macro-FF [2] is an extension of FF that has the ability to
automatically learn and make use of macro-actions. Marvin [7] is a heuristic search planner that
can form so-called macro-actions upon escaping from plateaus that can be reused for
future escapes. Both of these planners participated in the International Planning
Competition (IPC). A method for learning macros given an arbitrary planner and example
problems from a domain is given in [20].

A more theoretical approach was taken by [16], who studied the use of macros in 
conjunction with causal graphs. This work gives tractability results, and in particular 
shows that domain-independent planners can cope with exponentially long plans in 
polynomial time, which is also a feature of the present work. 

The use of macros in this paper contrasts with that of most works in that macros 
are generated and applied not over a domain or even over an instance, but with respect 
to a "current state" s and a (small) set of related states S. This ensures that the macros 
generated are tailored to the state set S, and no filtering due to over-generation of 
macros is necessary. 
2 Preliminaries



An instance of the planning problem is a tuple Π = (V, init, goal, A) whose components
are described as follows.

• V is a finite set of variables, where each variable v ∈ V has an associated finite
domain D(v). Note that variables are not necessarily propositional, that is, D(v)
may have any finite size. A state is a mapping s defined on the variables V such
that s(v) ∈ D(v) for all v ∈ V. A partial state is a mapping p defined on
a subset vars(p) of the variables V such that for all v ∈ vars(p), it holds that
p(v) ∈ D(v).

• init is a state called the initial state. 

• goal is a partial state. 

• A is a set of actions. An action a consists of a precondition pre(a), which is a
partial state, as well as a postcondition post(a), also a partial state. We sometimes
denote an action a by (pre(a); post(a)).

Note that when s is a state or partial state, and W is a subset of the variable set V, we
will use (s ↾ W) to denote the partial state resulting from restricting s to W. We say
that a state s is a goal state if (s ↾ vars(goal)) = goal.

We say that an action a is applicable at a state s if (s ↾ vars(pre(a))) = pre(a).
We define a plan to be a sequence of actions P = a_1, ..., a_n. We will always speak
of actions and plans relative to some planning instance Π = (V, init, goal, A), but we
want to emphasize that when speaking (for example) of an action, the action need not
be an element of A; we require only that its precondition and postcondition are partial
states over Π.

Starting from a state s, we define the state resulting from s by applying a plan P,
denoted by s[P], inductively as follows. For the empty plan P = ε, we define s[ε] = s.
For non-empty plans P, denoting P = P', a, we define s[P', a] as follows.

• If a is applicable at s[P'], then s[P', a] is the state equal to post(a) on variables
v ∈ vars(post(a)), and equal to s[P'] on variables v ∈ V \ vars(post(a)).

• Otherwise, s[P', a] = s[P'].
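The inductive definition of s[P] can be sketched in Python as follows. States and partial states are encoded as dictionaries and actions as (pre, post) pairs; this encoding and the names restrict, applicable and run are illustrative assumptions, not part of the formal development.

```python
def restrict(s, variables):
    """(s restricted to W): keep only the variables of `s` that lie in `variables`."""
    return {v: s[v] for v in variables if v in s}

def applicable(action, s):
    """An action is applicable at s if s agrees with its precondition."""
    pre, _post = action
    return restrict(s, pre.keys()) == pre

def run(s, plan):
    """Compute s[P] for a plan P given as a list of actions."""
    for action in plan:
        if applicable(action, s):
            _pre, post = action
            # Overwrite the postcondition variables, keep all the others.
            s = {**s, **post}
        # Otherwise the action is skipped and the state is left unchanged.
    return s

s0 = {"x": 0, "y": 0}
a = ({"x": 0}, {"x": 1})   # applicable at s0
b = ({"x": 5}, {"y": 9})   # never applicable below, hence skipped
result = run(s0, [a, b])
```

Note that, following the definition, an inapplicable action does not make the plan fail; it simply has no effect.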

We say that a state s is reachable (in an instance Π) if there exists a plan P such that
s = init[P]. We are concerned with the problem of plan generation: given an instance
Π = (V, init, goal, A), obtain a plan P that solves it, that is, a plan P such that init[P]
is a goal state.

Note that sometimes we will use the representation of a partial function f as the
relation {(a, b) : f(a) = b}.

3 Macro Computation Algorithm 

In this section, we develop our macro computation algorithm. This algorithm makes
use of a number of algorithmic subroutines. In particular, we will present the two
macro-producing operations discussed in the introduction, apply and transitive. First,
we define the notion of action graph, the data structure on which these operations work.



Definition 1 An action graph is a directed graph G whose vertex set, denoted by V(G),
is a set of states, and whose edge set, denoted by E(G), consists of labelled edges that
are actions; we denote the label of an edge e by l_G(e) (or l(e) when G is clear from
context). Note that for every ordered pair of vertices (s, s'), there may be at most one
edge (s, s') in E(G),⁴ and each edge has exactly one label.

We now define three functions which will themselves be used as subroutines in 
apply and transitive. 

Definition 2 We define the algorithmic function better(a, (s, s'), G) as follows.
Type-wise, the function better(a, (s, s'), G) requires that a is an action, G is an action graph,
and s and s' are vertices in G. The pseudocode for better(a, (s, s'), G) is as follows:

better(a, (s, s'), G) returns boolean
{
    if ((s, s') not in E(G))
        return TRUE;

    if (pre(a) strictly contained in pre(l(s, s')) AND
        post(a) contained in post(l(s, s')))
        return TRUE;

    if (pre(a) contained in pre(l(s, s')) AND
        post(a) strictly contained in post(l(s, s')))
        return TRUE;

    return FALSE;
}
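A possible Python rendering of better is sketched below. It assumes that partial states are dictionaries, that an action is a (pre, post) pair, and that the action graph is represented by a dictionary labels mapping an edge (s, s') to its label; these representation choices are assumptions made for illustration.

```python
def better(a, edge, labels):
    """Return True if action `a` should become the label of `edge`.

    `labels` maps edges to (pre, post) pairs. Containment is containment of
    partial states viewed as relations, i.e. as sets of (variable, value) pairs.
    """
    if edge not in labels:
        return True
    pre, post = a
    old_pre, old_post = labels[edge]
    # Dictionary item views are set-like: `<` is strict containment, `<=` is containment.
    if pre.items() < old_pre.items() and post.items() <= old_post.items():
        return True
    if pre.items() <= old_pre.items() and post.items() < old_post.items():
        return True
    return False

labels = {("s", "t"): ({"x": 0, "y": 0}, {"x": 1})}
more_general = ({"x": 0}, {"x": 1})          # weaker precondition, same effect
unrelated = ({"x": 0, "z": 1}, {"x": 1})     # precondition not contained in the old one
```

An action with fewer conditions applies to more states, which is the sense in which a label with a strictly smaller precondition or postcondition is preferred.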

Definition 3 We define the algorithmic function addlabel(G, s, s', a) as follows.
Type-wise, the function addlabel(G, s, s', a) requires that G is an action graph, s and s'
are vertices in G, and a is an action. The pseudocode for addlabel(G, s, s', a) is as
follows:

addlabel(G, s, s', a) returns G'
{
    G' := G;

    if ((s, s') not in E(G))
    {
        place (s, s') in E(G');
    }

    l_{G'}(s, s') := a;
    return G';
}

We remark that in our pseudocode, the assignment operator := is intended to be a 
value copy (as opposed to a reference copy, as in some programming languages). 

Definition 4 We define the algorithmic function combine(a, a') as follows. Type-wise,
the function combine(a, a') requires that a and a' are actions. We remark that in all
cases where we use the function combine(a, a'), there will exist states s_1, s_2 such that
a is applicable at state s_1, s_1[a] = s_2, and a' is applicable at state s_2. The pseudocode
for combine(a, a') is as follows:

combine(a, a') returns action a''
{
    R := vars(pre(a)) setminus vars(post(a));
    s := post(a) union (pre(a) | R);
    O := vars(post(a)) setminus vars(post(a'));
    pr := pre(a) union (pre(a') - s);
    pos := post(a') union (post(a) | O);
    return <pr; pos setminus pr>;
}

⁴ That is, an action graph is not a multigraph.

Here, the pipe symbol | should be interpreted as function restriction, and the
subtraction symbol in (pre(a') - s) should be interpreted as a set difference, where the
partial functions pre(a') and s are viewed as relations. Intuitively, the partial state s
represents what we know about a state if all we are told is that the action a has just
been successfully executed.
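The following Python sketch mirrors the combine pseudocode line by line, with partial states encoded as dictionaries (an assumption of this sketch). As an example it combines an unstack and a stack action of the Blocksworld-arm formulation used later in the paper; the block names b1, b and c are hypothetical.

```python
def combine(a, a2):
    """Macro equivalent to executing action `a` and then action `a2`."""
    pre, post = a
    pre2, post2 = a2
    R = set(pre) - set(post)                     # precondition variables a leaves untouched
    s = {**post, **{v: pre[v] for v in R}}       # what is known right after a
    O = set(post) - set(post2)                   # effects of a not overwritten by a2
    # pre(a) union (pre(a2) - s), with partial states viewed as relations.
    pr = {**pre, **{v: x for v, x in pre2.items() if (v, x) not in s.items()}}
    pos = {**post2, **{v: post[v] for v in O}}
    # Drop effects that merely restate the precondition.
    return pr, {v: x for v, x in pos.items() if (v, x) not in pr.items()}

unstack = ({"b1-clear": "T", "b1-on": "b", "arm": "empty"},
           {"b1-clear": "F", "b1-on": "arm", "arm": "b1", "b-clear": "T"})
stack = ({"arm": "b1", "c-clear": "T"},
         {"arm": "empty", "c-clear": "F", "b1-clear": "T", "b1-on": "c"})
macro = combine(unstack, stack)
```

The resulting macro no longer mentions the intermediate situation in which the arm holds b1: its precondition only asks that b1 is clear and on b, that the arm is empty and that c is clear, and its postcondition only records the net changes.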

The following propositions identify key properties of the combine function. 

Proposition 5 Let a, a' be actions and let s be a state. The action combine(a, a') is 
applicable at s if and only if a is applicable at s and a' is applicable at s[a]. When this 
occurs, s[combine(a, a')] is equal to s[a, a']. 

Proposition 6 The function combine is associative. That is, the action
combine(combine(a_1, a_2), a_3) is equal to the action combine(a_1, combine(a_2, a_3)),
assuming that there exists a state s such that a_1 is applicable in s, a_2 is applicable
in s[a_1], and a_3 is applicable in s[a_1, a_2].

We may now define the promised macro-producing operations. 

Definition 7 We define two algorithmic functions apply(G, A, a, s) and transitive(G, s_1, s_2, s_3).
Type-wise, the function apply(G, A, a, s) requires that G is an action graph, A is a set
of actions, a is an action, and s is a vertex of G. The pseudocode for apply(G, A, a, s)
is as follows:

apply(G, A, a, s) returns G'
{
    G' := G;

    if (a in A OR a appears as a label in G') {
        if (s[a] != s AND s[a] in V(G)) {
            if (better(a, (s, s[a]), G)) {
                G' := addlabel(G, s, s[a], a);
            }
        }
    }

    return G';
}

Type-wise, the function transitive(G, s_1, s_2, s_3) requires that G is an action graph,
and that s_1, s_2, and s_3 are vertices in G. The pseudocode for transitive(G, s_1, s_2, s_3)
is as follows.

transitive(G, s_1, s_2, s_3) returns G'
{
    G' := G;

    if ((s_1, s_2) in E(G) and
        (s_2, s_3) in E(G)) {
        a := l(s_1, s_2);
        a' := l(s_2, s_3);
        a'' := combine(a, a');
        if (better(a'', (s_1, s_3), G)) {
            G' := addlabel(G, s_1, s_3, a'');
        }
    }

    return G';
}

Within the function transitive, in the case that the addlabel function is called and
returns a graph G' that is different from the input graph G, we say that the transition
(s_1, a'', s_3) (where s_1, s_3, a'' are the arguments passed to the addlabel function) is
produced by the function.

In general, we use the term transition to refer to a triple (s, a, s') consisting of 
states s, s' and an action a such that a is applicable at s and s[a] = s'. 

Definition 8 An action graph program over a set of states S and a set of actions A is a
sequence of commands Σ = σ_1, ..., σ_n of the form apply(G, A, a, s), with s ∈ S, or
transitive(G, s_1, s_2, s_3), with s_1, s_2, s_3 ∈ S. The execution of an action graph program
takes place as follows. First, G is initialized to be the action graph with S as vertices
and no edges. Then, the commands of Σ are executed in order; for each i, after σ_i is
executed, G is replaced with the returned value.

The following is our macro computation algorithm. As input, it takes a set of states 
S and a set of actions A. The running time can be bounded by O(n|S|^3(|A| + |S|^2)),
where n denotes the number of variables.

compute_macros(S, A) returns G, M
{
    M := empty;
    V(G) := S;
    E(G) := empty set;

    do {
        A' := (A union l(E(G)));
        for all a in A', s in V(G) {
            G := apply(G, A, a, s);
        }

        for all s1, s2, s3 in V(G) {
            G := transitive(G, s1, s2, s3);
            if (transitive produces a transition) {
                append "l(s1, s3) = l(s1, s2), l(s2, s3)" to M;
            }
        }
    }
    while (some change was made to G);

    return (G, M);
}
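The whole procedure can be prototyped in a few lines of Python. The sketch below is a simplified reading of the pseudocode under several assumptions: states are dictionaries, the action graph is a dictionary labels from edges to (pre, post) labels, the helpers freeze and step are hypothetical names, and the pool of candidate actions is a snapshot taken at the start of each round (so a label replaced earlier in the round may still be tried).

```python
from itertools import product

def freeze(state):
    return frozenset(state.items())

def step(state, action):
    """s[a]: apply `action` if applicable, else return the state unchanged."""
    pre, post = action
    return {**state, **post} if pre.items() <= state.items() else dict(state)

def combine(a, a2):
    (pre, post), (pre2, post2) = a, a2
    known = {**{v: pre[v] for v in set(pre) - set(post)}, **post}
    pr = {**pre, **{v: x for v, x in pre2.items() if (v, x) not in known.items()}}
    pos = {**{v: post[v] for v in set(post) - set(post2)}, **post2}
    return pr, {v: x for v, x in pos.items() if (v, x) not in pr.items()}

def better(a, edge, labels):
    if edge not in labels:
        return True
    (pre, post), (old_pre, old_post) = a, labels[edge]
    return ((pre.items() < old_pre.items() and post.items() <= old_post.items())
            or (pre.items() <= old_pre.items() and post.items() < old_post.items()))

def compute_macros(states, actions):
    verts = {freeze(s): s for s in states}
    labels, macros = {}, []
    changed = True
    while changed:
        changed = False
        pool = list(actions) + list(labels.values())   # A union l(E(G)), snapshot
        for a, (ks, s) in product(pool, list(verts.items())):   # apply
            kt = freeze(step(s, a))
            if kt != ks and kt in verts and better(a, (ks, kt), labels):
                labels[(ks, kt)] = a
                changed = True
        for k1, k2, k3 in product(list(verts), repeat=3):       # transitive
            if (k1, k2) in labels and (k2, k3) in labels:
                macro = combine(labels[(k1, k2)], labels[(k2, k3)])
                if better(macro, (k1, k3), labels):
                    labels[(k1, k3)] = macro
                    macros.append((k1, k2, k3))
                    changed = True
    return labels, macros

# x can be incremented twice; y is a bystander. The state (x=1, y=1) is not in S.
inc01, inc12 = ({"x": 0}, {"x": 1}), ({"x": 1}, {"x": 2})
S = [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 2, "y": 0},
     {"x": 0, "y": 1}, {"x": 2, "y": 1}]
labels, macros = compute_macros(S, [inc01, inc12])
```

In this toy run, the macro (x = 0; x = 2) is first derived along the states with y = 0 and is then applied to the state (x = 0, y = 1), yielding the pair ((x = 0, y = 1), (x = 2, y = 1)). A plain transitive closure over S cannot find this pair, since the intermediate state (x = 1, y = 1) is missing from S.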

Understanding compute_macros. By a combination over A, we mean an action in
A or an action that can be derived from actions in A by (possibly multiple) applications
of the combine function.

Definition 9 We say that a transition (s, a, s') is condition-minimal with respect to a
set of actions A if for any combination a' over A, if s[a'] = s' then pre(a) ⊆ pre(a')
and post(a) ⊆ post(a') (when pre(a), pre(a'), post(a), and post(a') are viewed as
relations).

Having defined the notion of a condition-minimal transition, we can now naturally 
define the notion of a condition-minimal program. 

Definition 10 Relative to a planning instance Π, let S be a set of states, and let A,
A' be sets of actions. An A-condition-minimal-program (for short, A-CM-program)
over states S and actions A' is an action graph program over S and A' such that
when executed, apply is only passed pairs (a, s) such that (s, a, s[a]) is
condition-minimal with respect to A, and the transitive commands produce only transitions that
are condition-minimal with respect to A.

We now define a notion of derivable action. This notion is defined recursively.
Roughly speaking, derivable actions are actions that will provably be discovered as
macros by the algorithm.

Definition 11 Relative to a planning instance Π, let S be a set of states, and let A be a
set of actions. We define the set of (S, A)-derivable actions recursively, as the smallest
set satisfying: any action of a transition produced by an A-CM-program over states S
and the set of actions that are (S, A)-derivable or in A, is (S, A)-derivable.

Lemma 12 Relative to a planning instance Π with action set A, let s be a state. Any
(H(s, k), A)-derivable action is discovered by a call to the function compute_macros
with the first two arguments H(s, k) and A, by which we mean that any such action
will appear as an edge label in the graph output by compute_macros.

We emphasize that, in the compute_macros procedure, labels of edges are 
merely actions, which (as defined) are precondition-postcondition pairs that need not 
appear in the original set of actions A. When new edge labels are introduced, they are 
always obtained from existing labels or from A via the combine procedure, which 
permits the general applicability of edge labels. 

Proof (Sketch). Let Σ = σ_1, ..., σ_n be an A-CM-program over H(s, k) and actions
that are discovered by compute_macros, and let H be the graph returned by
compute_macros; we prove the result by induction.

We consider the execution of the program Σ with graph G. We prove by induction
on i ≥ 1 that after the command σ_i is executed and returns graph G_i, for every edge
(s, s') ∈ E(G_i), it holds that (s, s') ∈ E(H) and l_{G_i}(s, s') = l_H(s, s').

If σ_i is an apply command (with arguments s and a) that effects a change in the
graph, then the input action must be in l(E(G_i)). The command σ_i can be successfully
applied at H. Since H is a fixed point over all apply and transitive commands, the
action a passed to apply or one that is better (according to the function better)
must appear in H at l_H(s, s[a]). By condition-minimality of (s, a, s[a]), we have that
a = l_H(s, s[a]).

If σ_i is a transitive command that produces a transition (s, a, s'), then the actions
a' and a'' (from within the execution of the command), by induction hypothesis,
appear in H. Since H is a fixed point over all apply and transitive commands, the
action combine(a, a') or one that is better must appear in H at l_H(s, s'). By
condition-minimality of (s, combine(a, a'), s'), we have that combine(a, a') = l_H(s, s'). □

4 Examples 

Blocksworld-arm. We will present results with respect to the following formulation
of the Blocksworld-arm domain, which is based strongly on the propositional STRIPS
formulation. We choose this formulation primarily to lighten the presentation, and
remark that it is straightforward to verify that our proofs and results apply to the
propositional formulation.

Domain 13 (Blocksworld-arm domain) We use a formulation of this domain where
there is an arm. Formally, in an instance Π = (V, init, goal, A) of the Blocksworld-arm
domain, there is a set of blocks B, and the variable set V is defined as {arm} ∪ {b-on :
b ∈ B} ∪ {b-clear : b ∈ B} where D(arm) = {empty} ∪ B and for all b ∈ B,
D(b-on) = {table, arm} ∪ B and D(b-clear) = {T, F}. The b-on variable tells what
the block b is on top of, or whether it is being held by the arm, and the b-clear variable
tells whether or not the block b is clear.
There are four kinds of actions.

• ∀b ∈ B, pickup_b = (b-clear = T, b-on = table, arm = empty; b-clear =
F, b-on = arm, arm = b)

• ∀b ∈ B, putdown_b = (arm = b; arm = empty, b-clear = T, b-on = table)

• ∀b, c ∈ B, unstack_{b,c} = (b-clear = T, b-on = c, arm = empty; b-clear =
F, b-on = arm, arm = b, c-clear = T)

• ∀b, c ∈ B, stack_{b,c} = (arm = b, c-clear = T; arm = empty, c-clear =
F, b-clear = T, b-on = c)

□
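For experimentation, the four action schemas of Domain 13 can be instantiated mechanically. The Python sketch below does so for a given set of blocks; the function name blocksworld_actions, the string encoding of variables, and the restriction to pairs of distinct blocks b ≠ c are assumptions of this sketch.

```python
def blocksworld_actions(blocks):
    """Ground (pre, post) actions of the Blocksworld-arm formulation, keyed by name."""
    acts = {}
    for b in blocks:
        acts[f"pickup_{b}"] = (
            {f"{b}-clear": "T", f"{b}-on": "table", "arm": "empty"},
            {f"{b}-clear": "F", f"{b}-on": "arm", "arm": b},
        )
        acts[f"putdown_{b}"] = (
            {"arm": b},
            {"arm": "empty", f"{b}-clear": "T", f"{b}-on": "table"},
        )
        for c in blocks:
            if c == b:
                continue  # assumption: a block is never stacked on itself
            acts[f"unstack_{b}_{c}"] = (
                {f"{b}-clear": "T", f"{b}-on": c, "arm": "empty"},
                {f"{b}-clear": "F", f"{b}-on": "arm", "arm": b, f"{c}-clear": "T"},
            )
            acts[f"stack_{b}_{c}"] = (
                {"arm": b, f"{c}-clear": "T"},
                {"arm": "empty", f"{c}-clear": "F", f"{b}-clear": "T", f"{b}-on": c},
            )
    return acts

acts = blocksworld_actions(["a", "b", "c"])
```

With three blocks this yields 3 pickup, 3 putdown, 6 unstack and 6 stack actions. Every unstack or stack action changes four variables, which is why Hamming distances of 4 and 5 already suffice in the results that follow.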

Definition 14 Relative to an instance Π of Blocksworld-arm and a reachable state s
of Π, a pile P of s is a non-empty sequence of blocks (b_1, ..., b_k) such that s(b_i-on) =
b_{i+1} for all i ∈ [1, k - 1]. The top of the pile P is the block top(P) = b_1, and the
bottom of the pile is the block bottom(P) = b_k. The size of P is |P| = k.

A sub-tower of s is a pile P such that s(top(P)-clear) = T; a tower is a sub-tower
such that s(bottom(P)-on) = table.

We use the notation P_≥(b) (respectively, P_>(b), P_≤(b), P_<(b)) to denote the
sub-tower with bottom block b (respectively, the sub-tower stacked on b, and the piles
supporting b, either including b or not).

Definition 15 Let Π be a planning instance of Blocksworld-arm. Let P = (b_1, ..., b_k)
be a sequence of blocks, and b and b' two different blocks not in P. Let S be the partial
state {b_1-clear = T, arm = empty, b_1-on = b_2, ..., b_{k-1}-on = b_k}. We define several
actions with S as common precondition.

• The action subtow-table_{P,b} = (S, b_k-on = b; b_k-on = table, b-clear = T)
moves a sub-tower P from a block b to the table.

• The action subtow-block_{P,b,b'} = (S, b_k-on = b, b'-clear = T; b_k-on = b', b-clear =
T, b'-clear = F) moves a sub-tower P from a block b onto a block b'.

• The action tow-block_{P,b'} = (S, b_k-on = table, b'-clear = T; b_k-on = b', b'-clear =
F) moves a tower P onto a block b'.

Theorem 16 Let Π be a planning instance of Blocksworld-arm, and let s be a
reachable state with s(arm) = empty.

• If P is a sub-tower of s and s(b_k-on) = b, then subtow-table_{P,b} is (H(s, 4), A)-
derivable.

• If P is a sub-tower of s, s(b_k-on) = b and s(b'-clear) = T, then subtow-block_{P,b,b'}
is (H(s, 5), A)-derivable.

• If P is a tower of s, s(b_k-on) = table and s(b'-clear) = T, then tow-block_{P,b'} is
(H(s, 4), A)-derivable.

Proof (Sketch). The proof has two parts. First, we show that the aforementioned
actions are condition-minimal. Then, we describe how to obtain an A-CM-program
that produces the actions inside H(s, 5). We consider the case a = subtow-block_{P,b,b'};
the remaining actions admit similar proofs that only require Hamming distance 4.

To prove condition-minimality of action a we consider any combination C =
(a_1, ..., a_t) of primitive actions from A such that s[C] = s[a]. We must show that
the actions unstack_{b_1,b_2}, ..., unstack_{b_k,b}, stack_{b_k,b'} appear in C in the given relative
order, and that no matter what the remaining actions of C are, this already implies that
pre(a) ⊆ pre(C) and post(a) ⊆ post(C). We remark that the proof is not
straightforward, since pre(C) and post(C) are the result of applying the combine subroutine
to several actions not yet determined.

To prove that there exists an A-CM-program that produces actions subtow-table
and tow-block inside H(s, 4) we use a mutual induction; we omit the proof here. We
then use these results for subtow-block, the proof for which we sketch here. Precisely,
we now show that subtow-block_{P,b,b'} is (H(s, 5), A)-derivable.

When |P| = 1, we derive subtow-block_{P,b,b'} by combining actions a_1 = unstack_{b_1,b}
and a_2 = stack_{b_1,b'}. The states s[a_1] and s[a_1, a_2] differ from s in 4 and
3 variables respectively, so both states lie inside H(s, 5). When |P| = k, let P' = P_>(b_k) in
state s. We use the derivable actions a_1 = subtow-table_{P',b_k}, a_2 = unstack_{b_k,b},
a_3 = stack_{b_k,b'} and a_4 = tow-block_{P',b_k}. It is easy to check that the state s[a_1, a_2, a_3]
is the one that is furthest from s, differing at the 5 variables b-clear, b_{k-1}-on, b_k-clear,
b_k-on and b'-clear. □
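The distances claimed for the case |P| = 1 can be checked mechanically. The sketch below encodes a small state with the block b_1 on b and a clear block c on the table (c plays the role of b'), and applies the two actions in turn; the dictionary encoding and the helper names step and hamming are assumptions of this sketch.

```python
def step(state, action):
    """s[a]: apply `action` if applicable, else return the state unchanged."""
    pre, post = action
    return {**state, **post} if pre.items() <= state.items() else dict(state)

def hamming(s, t):
    """Number of variables on which the two states differ."""
    return sum(1 for v in s if s[v] != t[v])

# b1 sits on b, which sits on the table; c is alone on the table; the arm is empty.
s = {"arm": "empty",
     "b1-on": "b", "b-on": "table", "c-on": "table",
     "b1-clear": "T", "b-clear": "F", "c-clear": "T"}

unstack_b1_b = ({"b1-clear": "T", "b1-on": "b", "arm": "empty"},
                {"b1-clear": "F", "b1-on": "arm", "arm": "b1", "b-clear": "T"})
stack_b1_c = ({"arm": "b1", "c-clear": "T"},
              {"arm": "empty", "c-clear": "F", "b1-clear": "T", "b1-on": "c"})

s1 = step(s, unstack_b1_b)          # s[a_1]
s2 = step(s1, stack_b1_c)           # s[a_1, a_2]
distances = (hamming(s, s1), hamming(s, s2))
```

The intermediate state differs from s on b1-clear, b1-on, arm and b-clear, and the final state on b1-on, b-clear and c-clear, in agreement with the counts 4 and 3 stated in the proof sketch.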

Towers of Hanoi. We study the formulation of Towers of Hanoi where, for every
disk d, a variable stores the position (that is, the disk or the peg) the disk d is on.
Formally, in an instance Π = (V, init, goal, A) of the Towers of Hanoi domain, there is
an ordered set of disks D = {d_1, ..., d_k} and a partially ordered set of positions P =
D ∪ {p_1, p_2, p_3}, where d_i < p_j for every i and j. The set of variables V is defined as
{d-on : d ∈ D} ∪ {x-clear : x ∈ P}, where D(d-on) = P and D(x-clear) = {T, F}.

The only actions in Towers of Hanoi are movement actions that move a disk d into
a position x, provided that both d and x are clear and d < x.

• ∀d ∈ D, ∀x, x' ∈ P, if d < x, then define move_{d,x',x} = (d-clear = T, x-clear =
T, d-on = x'; x-clear = F, x'-clear = T, d-on = x)

We define this planning domain as the set of those planning instances Π such that
init and goal are certain predetermined total states. Namely, in both states init and
goal it holds that d_i-on = d_{i+1} for all i ∈ [1, ..., k - 1], d_1-clear = T, d_i-clear = F for
all i ∈ [2, k] and p_2-clear = T. They only differ in three variables: init(d_k-on) = p_1,
init(p_1-clear) = F and init(p_3-clear) = T, but goal(d_k-on) = p_3, goal(p_1-clear) =
T and goal(p_3-clear) = F.
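The formulation above can be instantiated for a given number of disks as in the following Python sketch. The name hanoi_instance and the string encoding are hypothetical, and the sketch additionally assumes x ≠ x' and d < x' for the source position x', conditions that the schema above leaves implicit.

```python
def hanoi_instance(k):
    """Return (actions, init, goal) for Towers of Hanoi with k disks."""
    disks = [f"d{i}" for i in range(1, k + 1)]
    pegs = ["p1", "p2", "p3"]
    positions = disks + pegs

    def smaller(d, x):
        # d < x: every disk is below every peg in the order; disks compare by index.
        return x in pegs or disks.index(d) < disks.index(x)

    actions = {}
    for d in disks:
        for x_from in positions:
            for x in positions:
                if x != x_from and smaller(d, x) and smaller(d, x_from):
                    actions[(d, x_from, x)] = (
                        {f"{d}-clear": "T", f"{x}-clear": "T", f"{d}-on": x_from},
                        {f"{x}-clear": "F", f"{x_from}-clear": "T", f"{d}-on": x},
                    )

    def tower_on(peg, empty_peg):
        s = {f"{disks[i]}-on": disks[i + 1] for i in range(k - 1)}
        s[f"{disks[-1]}-on"] = peg
        s.update({f"{d}-clear": "F" for d in disks[1:]})
        s[f"{disks[0]}-clear"] = "T"
        s.update({f"{peg}-clear": "F", f"{empty_peg}-clear": "T", "p2-clear": "T"})
        return s

    return actions, tower_on("p1", "p3"), tower_on("p3", "p1")

actions, init, goal = hanoi_instance(3)
```

As stated above, the two total states init and goal differ on exactly three variables, even though every plan connecting them has length exponential in the number of disks.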

Definition 17 Let Π be a planning instance of Towers of Hanoi. Let i be
an integer i ∈ [1, k]. Let x = init(d_i-on) and x' ∈ {p_2, p_3}. We define the action
subtow-pos_{i,x'} = (d_1-clear = T, d_1-on = d_2, ..., d_{i-1}-on = d_i, d_i-on =
x, x'-clear = T; d_i-on = x', x-clear = T, x'-clear = F), that is, the action that moves
the tower of depth i from x to x'.

Theorem 18 The actions subtow-pos_{i,x'} are (H(init, 7), A)-derivable.

We prove this by induction on i, the height of the subtower. To derive actions of
the form subtow-pos_{i+1,x'} from the actions of the form subtow-pos_{i,x'} we make
use of the classical recursive solution to Towers of Hanoi; an analysis shows that this
recursive step stays within Hamming distance 7 of the initial state.

5 Width 

In this section, we define macro persistent Hamming width and
present our width results for the two domains. For a state $s$, we define wrong($s$) to be the set of
goal variables whose value in $s$ differs from their goal value, that is,
$\text{wrong}(s) = \{v \in \text{vars}(\text{goal}) \mid s(v) \neq \text{goal}(v)\}$.

Definition 19 With respect to a planning instance $(V, \text{init}, \text{goal}, A)$, we say that a state
$s'$ is an improvement of a state $s$ if

• for all $v \in V$, if $v \in \text{vars}(\text{goal})$ and $s(v) = \text{goal}(v)$, then $s'(v) = \text{goal}(v)$; and,

• there exists $u \in \text{vars}(\text{goal})$ such that $u \in \text{wrong}(s)$ and $s'(u) = \text{goal}(u)$.

In this case, we say that such a variable $u$ is a variable being improved.

Definition 20 With respect to a planning instance $(V, \text{init}, \text{goal}, A)$, we say that a plan
$P$ improves a state $s$ if $s[P]$ is a goal state, or $s[P]$ is an improvement of $s$.

Relative to a planning instance, we say that a state $s$ dominates another state $s'$ if
$\{v \in V : s(v) \neq s'(v)\} \subseteq \text{vars}(\text{goal})$ and $\text{wrong}(s) \subseteq \text{wrong}(s')$; intuitively, $s$ may
differ from $s'$ only in that it may have more variables set to their goal position. Recall
that for a state $s$ and natural number $k \geq 0$, we use $H(s, k)$ to denote the set of all
states within Hamming distance $k$ from $s$.
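
The notions just introduced are easy to state operationally. The following is a minimal sketch, assuming that total states are dictionaries over all variables and that `goal` is a dictionary over vars(goal) only; the function names and the `domains` argument are illustrative, not taken from the paper.

```python
from itertools import combinations, product


def wrong(s, goal):
    """Goal variables whose value in s differs from the goal value."""
    return {v for v in goal if s[v] != goal[v]}


def is_improvement(s_new, s, goal):
    """s_new keeps every goal variable that s had right and fixes at least one."""
    keeps = all(s_new[v] == goal[v] for v in goal if s[v] == goal[v])
    gains = any(s_new[u] == goal[u] for u in wrong(s, goal))
    return keeps and gains


def dominates(s, s_other, goal):
    """s differs from s_other only on goal variables and has no extra wrong ones."""
    differing = {v for v in s if s[v] != s_other[v]}
    return differing <= set(goal) and wrong(s, goal) <= wrong(s_other, goal)


def hamming_ball(s, k, domains):
    """Yield every total state within Hamming distance k of s, without repeats."""
    variables = sorted(s)
    seen = set()
    for r in range(k + 1):
        for chosen in combinations(variables, r):
            for values in product(*(domains[v] for v in chosen)):
                t = dict(s)
                t.update(zip(chosen, values))
                key = tuple(t[v] for v in variables)
                if key not in seen:
                    seen.add(key)
                    yield t
```

With three binary variables, `hamming_ball(s, 1, domains)` yields four states: `s` itself and the three states obtained by flipping one variable.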

We now give the official definition of our new width notion. 

Definition 21 A planning instance $(V, \text{init}, \text{goal}, A)$ has macro persistent Hamming
width $k$ (for short, MPH width $k$) if no plan exists, or for every reachable state $s$
dominating the initial state init, there exists a plan over $(H(s, k), A)$-derivable actions
improving $s$ that stays within Hamming distance $k$ of $s$.

It is straightforwardly verified that if an instance has PH width k, then it has MPH 
width k. 

We now give a polynomial-time algorithm for sets of planning instances having 
bounded MPH width. We establish the following theorem. 

Theorem 22 Let $C$ be a set of planning instances having MPH width $k$. The plan
generation problem for $C$ is solvable in polynomial time via the following algorithm, in
time $O(n^{3k+2} d^{3k} (a + (nd)^{2k}))$. Here, $n$ denotes the number of variables, $d$ denotes
the maximum size of a domain, and $a$ denotes the number of actions.

solve_mph((V, init, goal, A), k)
{
    Q := empty plan;
    M := empty set of macros;
    s := init;
    while (s is not a goal state) {
        (G, M') := compute_macros(H(s, k), A);
        append M' to M;
        if (an improvement s' of s is reachable from s in G) {
            append l(s, s') to Q;    // label of the edge (s, s') in G
            s := s';
        }
        else {
            halt;
        }
    }
    print M;
    print Q;
}
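
The outer loop of the algorithm can be illustrated by a runnable sketch. Note the substitution: in place of compute_macros, which is defined earlier in the paper, this sketch uses a plain breadth-first search over primitive actions restricted to $H(s, k)$, so it only illustrates the greedy improvement loop and not the macro computation. The action format (a pair of precondition and postcondition dictionaries) and all function names are assumptions made for this example.

```python
from collections import deque


def hamming(s, t):
    return sum(1 for v in s if s[v] != t[v])


def apply_action(state, action):
    """Return the successor state, or None if the precondition does not hold."""
    pre, post = action
    if all(state[v] == val for v, val in pre.items()):
        new = dict(state)
        new.update(post)
        return new
    return None


def find_improvement(s, goal, actions, k):
    """BFS from s inside H(s, k); a stand-in for compute_macros plus reachability."""
    def key(t):
        return tuple(sorted(t.items()))

    was_right = [v for v in goal if s[v] == goal[v]]
    was_wrong = [v for v in goal if s[v] != goal[v]]
    queue = deque([(s, [])])
    seen = {key(s)}
    while queue:
        t, plan = queue.popleft()
        if (plan and all(t[v] == goal[v] for v in was_right)
                and any(t[v] == goal[v] for v in was_wrong)):
            return t, plan
        for name, action in actions.items():
            u = apply_action(t, action)
            if u is not None and hamming(s, u) <= k and key(u) not in seen:
                seen.add(key(u))
                queue.append((u, plan + [name]))
    return None


def solve_greedy(init, goal, actions, k):
    """Repeatedly replace s by an improvement of s; None corresponds to 'halt'."""
    s, plan = dict(init), []
    while any(s[v] != goal[v] for v in goal):
        found = find_improvement(s, goal, actions, k)
        if found is None:
            return None
        s, step = found
        plan += step
    return plan
```

On a toy instance where a variable `x` can only be set while a `lock` variable is temporarily moved away from its goal value, the loop finds the three-step plan with `k = 2` and halts with `k = 1`, since the intermediate state at distance 2 is then outside the ball.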

Proof (Sketch). Let $\Pi \in C$ be a planning instance such that there exists a plan for
$\Pi = (V, \text{init}, \text{goal}, A)$. We want to show that solve_mph outputs a plan. During
the execution of solve_mph, the state $s$ can only be replaced by states that are
improvements of it, and thus $s$ always dominates the initial state init. By definition of
MPH width, then, for any $s$ encountered during execution, there exists a plan over
$(H(s, k), A)$-derivable actions improving $s$ staying within Hamming distance $k$ of $s$.
By Lemma 12, all of the actions are discovered by compute_macros, and thus the
reachability check in solve_mph will find an improvement.

We now perform a running time analysis of the algorithm. Let $v$ denote the number
of vertices in the graphs in compute_macros, that is, $|H(s,k)|$. We have
$v \leq \binom{n}{k} d^k \in O((nd)^k)$. Let $e$ be the maximum number of edges; we have
$e = \binom{v}{2} \in O((nd)^{2k})$. The do-while loop in compute_macros will execute at most
$2n \cdot e \in O(ne)$ times, since once an edge is introduced, its label may change at most $2n$ times,
by definition of better. Each time this loop iterates, it uses no more than $(a + e)v + v^3$
time: apply can be called on no more than $(a + e)v$ inputs, and transitive can be called
on no more than $v^3$ inputs. The while loop in solve_mph loops at most $n$ times, and
each time, by the previous discussion, it requires $ne((a + e)v + v^3)$ time for the call
to compute_macros, and $(v + e)$ time for the reachability check. The total time is
thus $O(n(ne((a + e)v + v^3) + (v + e)))$, which is $O(n^2 e((a + e)v + v^3))$, which is
$O(n^2 e(a + e)v)$, which is $O(n^{3k+2} d^{3k}(a + (nd)^{2k}))$. □
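
For readers who want the last step spelled out, the final bound follows by substituting the bounds $v \in O((nd)^k)$ and $e \in O((nd)^{2k})$ into the simplified expression $n^2 e (a+e) v$. The display below is only a worked version of that substitution and introduces no new symbols.

```latex
\begin{align*}
n^{2}\, e\,(a+e)\, v
  &\in O\!\left(n^{2}\,(nd)^{2k}\,\bigl(a+(nd)^{2k}\bigr)\,(nd)^{k}\right) \\
  &= O\!\left(n^{2}\,(nd)^{3k}\,\bigl(a+(nd)^{2k}\bigr)\right) \\
  &= O\!\left(n^{3k+2}\, d^{3k}\,\bigl(a+(nd)^{2k}\bigr)\right).
\end{align*}
```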

Blocksworld.

Theorem 23 All instances of the Blocksworld-arm domain have MPH-width 10. 

According to Theorem 16, at any state $s$ we may consider our set of applicable
actions enriched by these new macro-actions. We now show how these new actions
can be used to improve any reachable state $s$. The proof is conceptually simple: improve
$s$ just by moving around a few piles of blocks. For instance, if $s(b\text{-on}) = b'$ but
$\text{goal}(b\text{-on}) = b''$, apply actions $\text{subtow-table}_{P_>(b''),b''}$, $\text{subtow-block}_{P_>(b),b',b''}$.
However, we must not forget that variables that were already in the goal state in $s$ must
remain so after the improvement. For instance, if $b$ was on top of $b'$ in $s$, then
unstacking $b$ from $b'$ will make $b'$-clear change from F to T. We may try to solve this by
placing some other block on top of $b'$, but then this movement may affect some other
variable which was already in the goal state, and so forth.

The following lemma is a case-by-case analysis of the solution to the difficulty we 
have described. 

Lemma 24 Let $\Pi$ be an instance of the Blocksworld-arm domain, and let $s$ be a reachable
state of $\Pi$ such that $s(\text{arm}) = \text{empty}$. If a block $b$ is such that $s(b\text{-clear}) = T$
but $\text{goal}(b\text{-clear}) = F$, then there is a plan using $(H(s, 6), A)$-derivable actions that
improves the variable $b$-clear in $s$.

Proof (Sketch). Clearly, $b = \text{top}(P_1)$ for some tower $P_1$ of $s$. Let $P_2, \ldots, P_t$ be the
remaining $t - 1$ towers of $s$, and let $t'$ be the number of towers of goal.

The proof proceeds by cases. If there is $i$ such that $\text{goal}(\text{bottom}(P_i)\text{-on}) \neq \text{table}$,
we say we are in Case 1. Otherwise, it holds that $t \leq t'$. In particular, there are $t'$
blocks $b'$ such that $\text{goal}(b'\text{-clear}) = T$ (block $b$ not being one of them), and $t$ blocks $b'$
such that $s(b'\text{-clear}) = T$ (block $b$ being one of them). It follows that there exists a block
$b'$ such that $\text{goal}(b'\text{-clear}) = T$ but $s(b'\text{-clear}) = F$. We say we are in Case 2 if the
block $b'$ belongs to the tower $P_1$, and in Case 3 if not. Throughout this proof we say
that a block $b'$ is badly placed if $s(b'\text{-on}) \neq \text{goal}(b'\text{-on})$.
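
The case distinction can be sketched in code. The representation (each tower is a list of blocks, bottom first) and the function name `classify_case` are assumptions made for this example. The text does not say which case applies when suitable blocks $b'$ exist both inside and outside $P_1$; this sketch arbitrarily prefers Case 2.

```python
def classify_case(state_towers, goal_towers, b):
    """Return 1, 2 or 3 for the case of the proof that applies to block b.

    Assumes b is the top of some tower in the state (s(b-clear) = T)
    and is covered in the goal (goal(b-clear) = F).
    """
    p1 = next(t for t in state_towers if t[-1] == b)   # the tower with b on top
    goal_on_table = {t[0] for t in goal_towers}
    goal_clear = {t[-1] for t in goal_towers}
    state_clear = {t[-1] for t in state_towers}
    assert b not in goal_clear, "the lemma assumes goal(b-clear) = F"

    # Case 1: some tower of s has a bottom block that is not on the table in goal.
    if any(t[0] not in goal_on_table for t in state_towers):
        return 1

    # Otherwise some block is clear in goal but covered in s (counting argument).
    candidates = goal_clear - state_clear
    if any(x in p1 for x in candidates):
        return 2   # tie-break chosen for this sketch
    return 3
```

For example, with the single tower `["a", "c", "b"]` in the state and `["a", "b", "c"]` in the goal, block `c` is clear in the goal but covered in the state and lies in the same tower as `b`, so the function returns 2.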

Case 1. The tower $P_i$ is wrongly placed on the table, so we are allowed to change
the value of $\text{bottom}(P_i)\text{-on}$ without worry.

(a) If $i > 1$, then use $\text{tow-block}_{P_i,b}$ to stack the tower $P_i$ on top of $b$.

(b) If $i = 1$ and a tower $P_j$ with $j > 1$ has a badly placed block $b'$, then a possible
solution is to insert $P_1$ below $b'$. That is, move the sub-tower $P_>(b')$ on top of
$P_1$, and then move the new resulting tower on top of the place where $b'$ was in
state $s$, that is, on top of $s(b'\text{-on})$.

(c) If $i = 1$ and no tower $P_j$ with $j > 1$ has badly placed blocks, then consider
the pile $P'$ in state goal that $b$ belongs to, and let $b' = \text{top}(P')$. If block $b'$ is in
$P_j$ for $j > 1$ in state $s$, then $P_j$ would have some badly placed block, since $b'$
and $b$, sharing pile $P'$ in the goal state, would be in different piles in state $s$. So
$b'$ is in $P_1$, $\text{goal}(b'\text{-clear}) = T$ but $s(b'\text{-clear}) = F$, since $b$ is the top of $P_1$. It
follows that the block on top of $b'$ in pile $P_1$ is badly placed. To improve $b$-clear
use actions $\text{subtow-table}_{P_>(b'),b'}$ and $\text{tow-block}_{P_<(b'),b}$, that is, break the tower
over block $b'$ and swap the two parts.

Note that an action like $\text{tow-block}_{P_<(b'),b}$ is not derivable from $s$ since the pile
$P_<(b')$ is not a subtower of $s$, but it is derivable from $s' = s[\text{subtow-table}_{P_>(b'),b'}]$, a
state within distance 2 from $s$. This fact may increase the width required to discover the
derivable actions. In our case, a careful examination reveals that Situation (b) requires
width 5 and Situation (c) requires width 4.

Case 2. Note that if Case 1 does not apply then $t \leq t'$. Let $b'$ be the highest block
in $P_1$ such that $s(b'\text{-clear}) = F$ but $\text{goal}(b'\text{-clear}) = T$.

(a) If $t > 1$ and a tower $P_j$ with $j > 1$ has a badly placed block $b''$, then we insert
the pile $P_>(b')$ below $b''$, analogously to Situation (b) in Case 1. This procedure
improves variables $b$-clear and $b'$-clear at the same time, but it needs width 6.

(b) If there is a second block $b''$ in $P_1$ such that $\text{goal}(b''\text{-clear}) = T$, then swap the
sub-tower $P_>(b')$ with the pile between $b'$ and $b''$, not including the block $b''$.
The procedure is similar to Situation (c) in Case 1, but it requires width 5.

(c) If there is no second block $b''$ in $P_1$ and all the towers $P_j$ with $j > 1$ have no
badly placed blocks, it follows that either $t = 1$ or all towers $P_j$ with $j > 1$
are exactly as in the goal state. Observe that, in this situation, the blocks of $P_1$
form a tower in $s$ and in goal, but the order of the blocks in the two towers must
differ: the pile $P' = P_<(b')$, which is such that $\text{goal}(\text{top}(P')\text{-clear}) = T$ and
$\text{goal}(\text{bottom}(P')\text{-on}) = \text{table}$, cannot be a pile in goal. Hence there is a badly
placed block below $b'$. This situation is analogous to Situation (b) in Case 2, and
it also requires width 5.

Case 3. There is a block $b'$ such that $s(b'\text{-clear}) = F$ but $\text{goal}(b'\text{-clear}) = T$, and
the block is in some tower $P_j$ other than $P_1$. We just stack the sub-tower $P_>(b')$ on top
of $b$. □

Proof (Sketch) (of Theorem 23). Let $\Pi$ be an instance of the Blocksworld-arm domain,
and let $s$ be a reachable state of $\Pi$ that is not a goal state. We present the case where
$s(\text{arm}) = \text{goal}(\text{arm}) = \text{empty}$.

Improving $b$-on.

• $s(b\text{-on}) = \text{table}$, $\text{goal}(b\text{-on}) = b'$. If $s(b'\text{-clear}) = F$, then move the sub-tower
$P_>(b')$ onto the table. (This changes the variable $b''$-on, where $b''$ is the block
on top of $b'$ in $s$, which was not in the goal state in $s$.) Now the block $b'$ is clear,
so we stack the tower whose bottom is $b$ onto $b'$.

• $s(b\text{-on}) = b''$, $\text{goal}(b\text{-on}) = b'$. If $s(b'\text{-clear}) = F$ then we can swap piles
$P_>(b'')$ and $P_>(b')$. Otherwise, we stack $P_>(b'')$ on top of $b'$, but then $b''$-clear
becomes true. This is a problem if $\text{goal}(b''\text{-clear}) = F$, so we may need to apply
Lemma 24 at the current state. Again, a careful examination shows that we may
need width 8.

• $s(b\text{-on}) = b''$, $\text{goal}(b\text{-on}) = \text{table}$. Move $P_>(b)$ onto the table. As in the previous
case, apply Lemma 24 to the current state if $\text{goal}(b''\text{-clear}) = F$. In this case
we may need width 7.

Improving $b$-clear.

• $s(b\text{-clear}) = F$, $\text{goal}(b\text{-clear}) = T$. Move the pile $P_>(b)$ onto the table, so width
4 is enough.

• $s(b\text{-clear}) = T$, $\text{goal}(b\text{-clear}) = F$. Just apply Lemma 24, which requires width
6.

Under the assumption that s(arm) = goal(arm) = empty, there is nothing else to 
show, since we have explained how to improve any variable. The width number 10 
comes from the analysis of the other cases. □ 

Towers of Hanoi. 

Theorem 25 All instances of the Towers of Hanoi domain have MPH-width 7.

Each instance can be solved by a single application of the action $\text{subtow-pos}_{k,p_3}$.